DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects
نویسندگان
چکیده
Information-sharing is a key aspect of distributed applications such as database servers and web servers. Information-sharing also assists services such as caching, reconfiguration, etc. In the past, information-sharing has been implemented using ad-hoc messaging protocols which often incur high overheads and are not very scalable. This paper presents a new design for a scalable and a low-overhead Distributed Data Sharing Substrate (DDSS). DDSS is designed to support efficient data management and coherence models by leveraging the features of modern interconnects. It is implemented over the OpenFabrics interface and portable across multiple interconnects including iWARP-capable networks in LAN/WAN environments. Experimental evaluations with networks like InfiniBand and iWARP-capable Ammasso through data-center services show an order of magnitude performance improvement and the load resilient nature of the substrate. Application-level evaluations with Distributed STORM achieves close to 19% performance improvement over traditional implementation, while evaluations with check-pointing application suggest that DDSS is highly scalable.
منابع مشابه
Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems
Distributed applications tend to have a complex design due to issues such as concurrency, synchronization and communication. Researchers in the past have proposed simpler abstractions to hide these complexities. However, many of the proposed techniques use messaging protocols which incur high overhead and are not very scalable. To address these limitations, in our previous work [20], we propose...
متن کاملAdaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
متن کاملDesigning High Performance and Scalable Distributed Datacenter Services over Modern Interconnects
Modern interconnects like InfiniBand and 10 Gigabit Ethernet have introduced a range of novel features while delivering excellent performance. Due to their high performance to cost ratios, increasing number of datacenters are being deployed in clusters and cluster-of-cluster scenarios connected with these modern interconnects. However, the extent to which the current deployments manage to benef...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملAccelerating Complex Data Transfer for Cluster Computing
The ability to move data quickly between the nodes of a distributed system is important for the performance of cluster computing frameworks, such as Hadoop and Spark. We show that in a cluster with modern networking technology data serialization is the main bottleneck and source of overhead in the transfer of rich data in systems based on high-level programming languages such as Java. We propos...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006